Ambient Sound Provides Supervision for Visual Learning

نویسندگان

  • Andrew Owens
  • Jiajun Wu
  • Josh H. McDermott
  • William T. Freeman
  • Antonio Torralba
چکیده

The sound of crashing waves, the roar of fast-moving cars – sound conveys important information about the objects in our surroundings. In this work, we show that ambient sounds can be used as a supervisory signal for learning visual models. To demonstrate this, we train a convolutional neural network to predict a statistical summary of the sound associated with a video frame. We show that, through this process, the network learns a representation that conveys information about objects and scenes. We evaluate this representation on several recognition tasks, finding that its performance is comparable to that of other state-of-the-art unsupervised learning methods. Finally, we show through visualizations that the network learns units that are selective to objects that are often associated with characteristic sounds.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning

The sound of crashing waves, the roar of fastmoving cars – sound conveys important information about the objects in our surroundings. In this work, we show that ambient sounds can be used as a supervisory signal for learning visual models. To demonstrate this, we train a convolutional neural network to predict a statistical summary of the sound associated with a video frame. We show that, throu...

متن کامل

Learning to Localize Sound Source in Visual Scenes

Visual events are usually accompanied by sounds in our daily lives. We pose the question: Can the machine learn the correspondence between visual scene and the sound, and localize the sound source only by observing sound and visual scene pairs like human? In this paper, we propose a novel unsupervised algorithm to address the problem of localizing the sound source in visual scenes. A two-stream...

متن کامل

Question Answering through Transfer Learning from Large Fine-grained Supervision Data

We show that the task of question answering (QA) can significantly benefit from the transfer learning of models trained on a different large, fine-grained QA dataset. We achieve the state of the art in two well-studied QA datasets, WikiQA and SemEval-2016 (Task 3A), through a basic transfer learning technique from SQuAD. For WikiQA, our model outperforms the previous best model by more than 8%....

متن کامل

Evaluating non-speech sound visualizations for the deaf

Sounds such as co-workers chatting nearby or a dripping faucet help us maintain awareness of and respond to our surroundings. Without a tool that communicates ambient sounds in a nonauditory manner, maintaining this awareness is difficult for people who are deaf. We present an iterative investigation of peripheral, visual displays of ambient sounds. Our major contributions are: (1) a rich under...

متن کامل

Oil spill detection using in Sentinel-1 satellite images based on Deep learning concepts

Awareness of the marine area is very important for crisis management in the event of an accident. Oil spills are one of the main threats to the marine and coastal environments and seriously affect the marine ecosystem and cause political and environmental concerns because it seriously affects the fragile marine and coastal ecosystem. The rate of discharge of pollutants and its related effects o...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016